Abstract: A spam web page is a page that contains no useful information. Spammers create such pages for fun or to inflate their page rank and thereby generate revenue. Detecting spam web pages is one of the top challenges for search engines. There are two main approaches to spam web page detection: link-based analysis and content-based analysis. In this paper, we focus mainly on content-based analysis. We use parameters such as the average word length, keyword stuffing, the content of the body, the number of stop words, and the unique word counts for the body and title of the page to identify spam.
Keywords: Search engine, Web mining, Spam web page, Content-based analysis.
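
As a rough illustration of the content-based parameters named in the abstract, the following Python sketch computes a few of them for a single page. It is a minimal sketch under stated assumptions, not the method used in the paper: the function name `content_features`, the tiny `STOP_WORDS` list, the tokenization rule, and the use of the most frequent non-stop word's share of the body as a stand-in for keyword stuffing are all hypothetical choices made for this example.

```python
import re

# Small illustrative stop-word list (an assumption; the paper does not specify one).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_features(title, body):
    """Compute simple content-based features for one page (hypothetical helper)."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    n = len(body_words)

    # Count how often each word occurs in the body.
    counts = {}
    for word in body_words:
        counts[word] = counts.get(word, 0) + 1

    # Keyword-stuffing proxy (assumption): the share of the body taken up
    # by the single most frequent non-stop word.
    non_stop_counts = [c for w, c in counts.items() if w not in STOP_WORDS]
    top_count = max(non_stop_counts, default=0)

    return {
        "avg_word_length": sum(len(w) for w in body_words) / n if n else 0.0,
        "stop_word_count": sum(1 for w in body_words if w in STOP_WORDS),
        "unique_body_words": len(counts),
        "unique_title_words": len(set(title_words)),
        "top_keyword_ratio": top_count / n if n else 0.0,
    }


if __name__ == "__main__":
    features = content_features(
        "Cheap pills",
        "cheap pills cheap pills buy cheap pills",
    )
    for name, value in features.items():
        print(name, value)
```

In a sketch like this, a body dominated by one repeated keyword and containing few stop words would yield a high `top_keyword_ratio` and a low `stop_word_count`. Any thresholds for labelling a page as spam would have to be chosen from labelled data and are not implied here.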